{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align:right\"><i>Peter Norvig<br>June 2021</i></div>\n",
    "\n",
    "# Split the States\n",
    "\n",
    "The [538 Riddler for 11 June 2021](https://fivethirtyeight.com/features/can-you-split-the-states/) poses this problem  (paraphrased):\n",
    "\n",
    "> Given a map of the lower 48 states of the  USA, remove a subset of the states so that the map is cut into two  disjoint contiguous regions that are near-halves by area. Call the regions *A* and *B*, where *A* has the larger area. You can treat Michigan’s upper and lower peninsulas as two non-adjacent \"states,\" for a total of 49. \n",
    ">\n",
    "> To be precise, 538's question is: \n",
    "> 1. What states should you remove to maximize the area of *B*? What is *B*'s area and percent of the country's area?\n",
    ">\n",
    "> There is some ambiguity in the phrase \"near-halves by area\" and [Philip Bump](https://twitter.com/pbump/status/1400185939629117442) is interested in another question:\n",
    ">\n",
    "> 2. What states should you remove to minimize the difference of the area of *A* and the area of *B*? \n",
    ">\n",
    "> Bump hypothesized that {IL, MO, OK, NM} is the best subset to remove. Is he right?\n",
    "\n",
    "# Vocabulary terms \n",
    "\n",
    "Let's start by clarifying some concepts:\n",
    "- **State**: denoted by the standard 2-letter abbreviations like `'CA'` (plus `'UP'` and `'LP'` for the Michigan peninsulas). \n",
    "- **States**: a set of states; implemented as a (hashable) `frozenset`. I'll use `states('OR CA')` for  `frozenset({'OR', 'CA'})`.\n",
    "- **Region**: a set of states that are **contiguous**—they are all connected by a single tree of **neighbor** relations.\n",
    "- **Neighbor**: a relation saying two states share a border. Implemented as [adjacency sets](https://en.wikipedia.org/wiki/Adjacency_list) in the dict `neighbors`.\n",
    "- **Cut**: a set of states that, when removed from the map, cuts the map into disjoint regions. \n",
    "- **Split**: a tuple `(A, B, C)`, where  `A` and `B` are the two regions made by the cut `C`, with `A` larger than `B`.\n",
    "- **Border**: the states on the edge of the map (neighbors of Canada, Mexico, Atlantic, or Pacific).\n",
    "- **Area**: Each state has an area in square miles, given by the `areas` dict, and a region has a total area, given by the function `area`.\n",
    "\n",
    "\n",
    "Code to implement the concepts:   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import *\n",
    "\n",
    "State = str\n",
    "States = Region = frozenset # Hashable set of states\n",
    "\n",
    "Split = Tuple[Region, Region, Region] # (A, B, C) = (large region, small region, cut)\n",
    "\n",
    "def states(string) -> States: \"Set of states\"; return States(string.split())\n",
    "\n",
    "def area(states) -> int: \"Total area\"; return sum(areas[s] for s in states)\n",
    "\n",
    "neighbors = dict( # https://theincidentaleconomist.com/wordpress/list-of-neighboring-states-with-stata-code/\n",
    "    AK=states(''), AL=states('FL GA MS TN'), AR=states('LA MO MS OK TN TX'), AZ=states('CA CO NM NV UT'), \n",
    "    CA=states('AZ NV OR'), CO=states('AZ KS NE NM OK UT WY'), CT=states('MA NY RI'), DC=states('MD VA'), \n",
    "    DE=states('MD NJ PA'), FL=states('AL GA'), GA=states('AL FL NC SC TN'), HI=states(''), \n",
    "    IA=states('IL MN MO NE SD WI'), ID=states('MT NV OR UT WA WY'), IL=states('IA IN KY MO WI'), \n",
    "    IN=states('IL KY LP MI OH'), KS=states('CO MO NE OK'), KY=states('IL IN MO OH TN VA WV'), \n",
    "    LA=states('AR MS TX'), MA=states('CT NH NY RI VT'), MD=states('DC DE PA VA WV'), ME=states('NH'), \n",
    "    MI=states('IN OH WI'), MN=states('IA ND SD WI'), MO=states('AR IA IL KS KY NE OK TN'), \n",
    "    MS=states('AL AR LA TN'), MT=states('ID ND SD WY'), NC=states('GA SC TN VA'), ND=states('MN MT SD'), \n",
    "    NE=states('CO IA KS MO SD WY'), NH=states('MA ME VT'), NJ=states('DE NY PA'), NM=states('AZ CO OK TX UT'), \n",
    "    NV=states('AZ CA ID OR UT'), NY=states('CT MA NJ PA VT'), OH=states('IN KY LP MI PA WV'), \n",
    "    OK=states('AR CO KS MO NM TX'), OR=states('CA ID NV WA'), PA=states('DE MD NJ NY OH WV'), \n",
    "    RI=states('CT MA'), SC=states('GA NC'), SD=states('IA MN MT ND NE WY'), TN=states('AL AR GA KY MO MS NC VA'), \n",
    "    TX=states('AR LA NM OK'), UT=states('AZ CO ID NM NV WY'), VA=states('DC KY MD NC TN WV'), \n",
    "    VT=states('MA NH NY'), WA=states('ID OR'), WI=states('IA IL MI MN UP'), WV=states('KY MD OH PA VA'), \n",
    "    WY=states('CO ID MT NE SD UT'), UP=states('WI'), LP=states('IN OH'))\n",
    "\n",
    "areas = dict( # https://www.census.gov/geographies/reference-files/2010/geo/state-area.html\n",
    "    AK=665384, AL=52420,  AZ=113990, AR=53179, CA=163695, CO=104094, CT=5543,  DE=2489,   DC=68, \n",
    "    FL=65758,  GA=59425,  HI=10932,  ID=83569, IL=57914,  IN=36420,  IA=56273, KS=82278,  KY=40408, \n",
    "    LA=52378,  ME=35380,  MD=12406,  MA=10554, MI=96714,  MN=86936,  MS=48432, MO=69707,  MT=147040, \n",
    "    NE=77348,  NV=110572, NH=9349,   NJ=8723,  NM=121590, NY=54555,  NC=53819, ND=70698,  OH=44826, \n",
    "    OK=69899,  OR=98379,  PA=46054,  RI=1545,  SC=32020,  SD=77116,  TN=42144, TX=268596, UT=84897, \n",
    "    VT=9616,   VA=42775,  WA=71298,  WV=24230, WI=65496,  WY=97813,  UP=16377, LP=80337)  \n",
    "\n",
    "# Borders:\n",
    "north  = states('WA ID MT ND MN WI MI UP IL IN LP OH PA NY VT NH ME') \n",
    "south  = states('CA AZ NM TX LA MS AL FL')                   \n",
    "west   = states('WA OR CA')\n",
    "east   = states('ME NH MA RI CT NY NJ DE MD VA NC SC GA FL')\n",
    "border = north | south | west | east\n",
    "\n",
    "# \"Countries\":\n",
    "usa50 = States(areas) - states('DC UP LP')           # 50 actual US states\n",
    "usa48 = States(areas) - states('AK HI DC UP LP')     # lower 48 states\n",
    "usa49 = States(areas) - states('AK HI DC MI')        # lower 49 \"states\": MI split into UP, LP\n",
    "four  = states('UT CO AZ NM')                        # The \"four corners\" states\n",
    "western = states('WA OR CA ID NV UT AZ MT WY CO NM') # The 11 states west of the Rockies"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Strategy for answering the two questions\n",
    "\n",
    "My overall strategy:\n",
    "- Generate a large number of **cuts**. \n",
    "- For each cut *C*, determine the **split** into regions *A* and *B*. Discard cuts that don't produce exactly two regions.\n",
    "- Find the split that **maximizes** the area of *B*, and the split that **minimizes** the difference in area of *A* and *B*.\n",
    "\n",
    "# Making cuts\n",
    "\n",
    "Can we generate all possible cuts? A cut is a subset of the 49 states, so there are 2<sup>49</sup> or 500 trillion possible cuts, so **no**.  \n",
    "\n",
    "I have four ideas to limit the number of cuts:\n",
    "- **Make cuts small in number of states.** I'll consider subsets of up to 8 states.  That reduces the possibilites a million-fold to 500 million.\n",
    "- **Make cuts small in area.** A large area in the cut means there won't be much area left to make region *B* big. I'll limit the cut area.\n",
    "- **Make cuts contiguous.** Noncontiguous cuts can't be optimal for question 1, so I won't consider them.\n",
    "- **Make cuts go border-to-border.** A cut can produce exactly two regions only if (a) the cut runs from one place on the border to another place on the border or (b) the cut forms a \"donut\" that surrounds some interior region. The US map isn't big enough to support a decent-sized donut (there are only 14 non-border states, and only two of those, KS and NE, are not neighbors of a border state). Therefore I'll look only for cuts that go border-to-border. \n",
    "\n",
    "The function `make_cuts` generates  all contiguous regions of up to `maxsize` states (by default, 8)  and up to `maxarea` area (by default, twice the area of the {IL, MO, OK, NM} cut) that contain one of the `start` states (by default, the states on the north border), as well as one of the `end` states (by default, the states on the south border). \n",
    "\n",
    "Most of the work is done by the function `contiguous_regions`, which starts by building a set of regions where each region contains a single `start` state. Then in each iteration of the `while` loop, it yields each region from  the current set of regions, and creates a new set of regions formed by adding a neighboring state to a current region in all possible ways, as long as the area does not exceed `maxarea` and the size does not exceed `maxsize`. (On each iteration all the regions have the same size.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "maxarea = 2 * area(states('IL MO OK NM'))\n",
    "\n",
    "def make_cuts(country, maxsize=8, maxarea=maxarea, start=north, end=south) -> Iterator[Region]:\n",
    "    \"\"\"All regions up to `maxsize` and up to `maxarea` that contain a `start` and `end` state.\"\"\"\n",
    "    return filter(end.intersection, contiguous_regions(country, maxsize, maxarea, start))\n",
    "        \n",
    "def contiguous_regions(country, maxsize, maxarea, start) -> Iterator[Region]:\n",
    "    \"\"\"Contiguous regions up to `maxsize` and `maxarea` that contain one of `start`.\"\"\"\n",
    "    regions = {Region({s}) for s in start & country} \n",
    "    while regions:\n",
    "        yield from regions \n",
    "        regions = {region | {s1}\n",
    "                   for region in regions if len(region) + 1 <= maxsize \n",
    "                   for s in region for s1 in neighbors[s] \n",
    "                   if s1 in country and s1 not in region\n",
    "                   and area(region) + areas[s1] <= maxarea} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, the north-south cuts of size up to 3 through the western states:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{frozenset({'ID', 'NM', 'UT'}),\n",
       " frozenset({'CA', 'ID', 'NV'}),\n",
       " frozenset({'CA', 'ID', 'OR'}),\n",
       " frozenset({'AZ', 'ID', 'NV'}),\n",
       " frozenset({'CA', 'OR', 'WA'}),\n",
       " frozenset({'AZ', 'ID', 'UT'})}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set(make_cuts(western, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Making splits\n",
    "\n",
    "Now, given some cuts, the function `make_splits` creates **splits**: tuples `(A, B, C)`, where `A` and `B` are the two regions defined by the cut `C`, with `A` larger than `B`. A split requires that the cut divides the country into exactly two regions, but not all cuts do that.  For example, the cut {WA, OR, CA} leaves only one region (consisting of all the other states). The cut {ID, OR, NV, AZ} leaves three regions: {WA}, {CA}, and {everything else}. To verify whether the cut makes two regions, `make_splits` first finds a maximal-size contiguous region `A` from the non-cut states, then finds a region `B` from the remaining states, then checks that `A` and `B` are non-empty and make up all the non-cut states.\n",
    "\n",
    "We find the extent of a region with the `floodfill` [algorithm](https://en.wikipedia.org/wiki/Flood_fill): it maintains a mutable `region` and a `frontier` of the states that neighbor the region but are not in the region. We iterate adding the frontier into the region and computing a new frontier until there is no new frontier. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_splits(country, cuts) -> Iterator[Split]:\n",
    "    \"\"\"For each cut C, find regions A and B and yield the Split (A, B, C) if valid.\"\"\"\n",
    "    for C in cuts:\n",
    "        noncut = country - C\n",
    "        A = floodfill(noncut) \n",
    "        B = floodfill(noncut - A)\n",
    "        if A and B and A | B == noncut:\n",
    "            B, A = sorted([A, B], key=area) \n",
    "            yield (A, B, C)\n",
    "            \n",
    "def floodfill(legal: States) -> Region:\n",
    "    \"\"\"Starting at one state, fill out to all legal contiguous states.\"\"\"\n",
    "    region   = set() \n",
    "    frontier = {min(legal)} if legal else None\n",
    "    while frontier:\n",
    "        region |= frontier\n",
    "        frontier = {s1 for s in frontier for s1 in neighbors[s]\n",
    "                    if s1 in legal and s1 not in region}\n",
    "    return Region(region)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, the splits (*A*, *B*, *C*) of the western states from cuts *C* of size up to 3:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{(frozenset({'CO', 'MT', 'NM', 'WY'}),\n",
       "  frozenset({'CA', 'NV', 'OR', 'WA'}),\n",
       "  frozenset({'AZ', 'ID', 'UT'})),\n",
       " (frozenset({'CO', 'MT', 'NM', 'UT', 'WY'}),\n",
       "  frozenset({'CA', 'OR', 'WA'}),\n",
       "  frozenset({'AZ', 'ID', 'NV'})),\n",
       " (frozenset({'AZ', 'CO', 'MT', 'NM', 'UT', 'WY'}),\n",
       "  frozenset({'OR', 'WA'}),\n",
       "  frozenset({'CA', 'ID', 'NV'})),\n",
       " (frozenset({'AZ', 'CO', 'MT', 'NM', 'NV', 'UT', 'WY'}),\n",
       "  frozenset({'WA'}),\n",
       "  frozenset({'CA', 'ID', 'OR'}))}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set(make_splits(western, make_cuts(western, 3)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The answers\n",
    "\n",
    "The function `answers` puts it all together: makes cuts; makes splits from those cuts; finds the splits that answer the two questions; and depending on the value of the parameter `do`, either prints information in a pretty format, or returns the four values, or both. By default, just print."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def answers(country, maxsize=8, maxarea=maxarea, start=north, end=south, do='print') -> Optional[tuple]:\n",
    "    \"\"\"Find the splits that answer the 2 questions.\n",
    "    Print information in pretty format if 'print' is a substring of `do`.\n",
    "    Return the tuple (cuts, splits, answer1, answer2) if 'return' is a substring of `do`.\"\"\"\n",
    "    cuts    = list(make_cuts(country, maxsize, maxarea, start, end))\n",
    "    splits  = list(make_splits(country, cuts))\n",
    "    answer1 = max(splits, key=lambda s: area(s[1]))\n",
    "    answer2 = min(splits, key=lambda s: area(s[0]) - area(s[1]))\n",
    "    if 'print' in do:\n",
    "        print(f'{len(country)} states ⇒ {len(cuts):,d} cuts',\n",
    "              f'(maxsize ≤ {maxsize}, area ≤ {maxarea:,d}) ⇒ {len(splits):,d} splits.')\n",
    "        show('1. Split that maximizes area(B)', country, answer1)\n",
    "        show('2. Split that minimizes ∆ = area(A) - area(B)', country, answer2)\n",
    "    if 'return' in do:\n",
    "        return cuts, splits, answer1, answer2\n",
    "    \n",
    "def show(title, country, split):\n",
    "    \"\"\"Print a title, and a summary of the split in four rows. The columns shown are:\n",
    "    'region name|area|percent of country area|number of states in region|states in region'.\n",
    "    The ∆ row is not a region; it is the difference in area between A and B.\"\"\"\n",
    "    A, B, C = split\n",
    "    def print_row(name, region, sqmi): \n",
    "        statelist = f'{len(region):2d}|{\" \".join(sorted(region))}' if region else ''\n",
    "        print(f'{name}|{sqmi:9,d}|{sqmi/area(country):5.1%}|{statelist}')\n",
    "    print(f'\\n{title}:')\n",
    "    print_row('A', A,  area(A))\n",
    "    print_row('B', B,  area(B))\n",
    "    print_row('C', C,  area(C))\n",
    "    print_row('∆', '', area(A) - area(B))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "49 states ⇒ 43,901 cuts (maxsize ≤ 8, area ≤ 638,220) ⇒ 14,149 splits.\n",
      "\n",
      "1. Split that maximizes area(B):\n",
      "A|1,345,558|43.1%|29|AL AR CT DE FL GA IN KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TN TX VA VT WV\n",
      "B|1,344,149|43.1%|15|AZ CA IA ID MN MT ND NV OR SD UP UT WA WI WY\n",
      "C|  430,653|13.8%| 5|CO IL MO NE NM\n",
      "∆|    1,409| 0.0%|\n",
      "\n",
      "2. Split that minimizes ∆ = area(A) - area(B):\n",
      "A|1,267,033|40.6%|14|AZ CA IA ID MN MT ND NV OR UP UT WA WI WY\n",
      "B|1,266,994|40.6%|27|AL AR CT DE FL GA KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TX VA VT WV\n",
      "C|  586,333|18.8%| 8|CO IL IN MO NE NM SD TN\n",
      "∆|       39| 0.0%|\n"
     ]
    }
   ],
   "source": [
    "answers(usa49)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are maps of the two answer splits (courtesy [electoralvotemap.com](https://electoralvotemap.com/)):\n",
    "\n",
    "1. The cut {CO, IL, MO, NE, NM} gives a *B* (to the west) with area 1,344,149, or 43.1% of the lower 48: \n",
    "\n",
    "![map3.png](map3.png)\n",
    "\n",
    "2. The cut {CO, IL, IN, MO, NE, NM, SD, TN} gives two regions that differ in area by only 39 square miles. You can think of this as starting from the cut in (1) and then cutting SD from the western region and TN and IN from the eastern region, to even out the areas:\n",
    "\n",
    "![map4.png](map4.png)\n",
    "\n",
    "Both these results are better than the 4-state cut {IL, MO, OK, NM}. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Adding DC\n",
    "\n",
    "Would anything change if we made DC a state (besides, obviously, the voting rights of the citizens)?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "50 states ⇒ 45,810 cuts (maxsize ≤ 8, area ≤ 638,220) ⇒ 14,137 splits.\n",
      "\n",
      "1. Split that maximizes area(B):\n",
      "A|1,345,626|43.1%|30|AL AR CT DC DE FL GA IN KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TN TX VA VT WV\n",
      "B|1,344,149|43.1%|15|AZ CA IA ID MN MT ND NV OR SD UP UT WA WI WY\n",
      "C|  430,653|13.8%| 5|CO IL MO NE NM\n",
      "∆|    1,477| 0.0%|\n",
      "\n",
      "2. Split that minimizes ∆ = area(A) - area(B):\n",
      "A|1,267,062|40.6%|28|AL AR CT DC DE FL GA KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TX VA VT WV\n",
      "B|1,267,033|40.6%|14|AZ CA IA ID MN MT ND NV OR UP UT WA WI WY\n",
      "C|  586,333|18.8%| 8|CO IL IN MO NE NM SD TN\n",
      "∆|       29| 0.0%|\n"
     ]
    }
   ],
   "source": [
    "answers(usa49 | {'DC'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cuts are the same, but for question 2, adding DC's 68 square miles to the eastern region means that it is now only 29 square miles larger than the western region (previously it was 39 square miles smaller)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reuniting Michigan\n",
    "\n",
    "What if Michigan counts as one state, rather than two separate penninsulas? What if we then also add in DC?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "48 states ⇒ 43,941 cuts (maxsize ≤ 8, area ≤ 638,220) ⇒ 19,811 splits.\n",
      "\n",
      "1. Split that maximizes area(B):\n",
      "A|1,348,646|43.2%|30|AL CT DE FL GA IA IL IN KY MA MD ME MI MN MT NC ND NH NJ NY OH PA RI SC SD TN VA VT WI WV\n",
      "B|1,341,666|43.0%|12|AZ CA CO KS LA NM NV OK OR TX UT WA\n",
      "C|  430,048|13.8%| 6|AR ID MO MS NE WY\n",
      "∆|    6,980| 0.2%|\n",
      "\n",
      "2. Split that minimizes ∆ = area(A) - area(B):\n",
      "A|1,267,816|40.6%|13|AZ CA ID KS MN MT ND NE NV OR SD UT WA\n",
      "B|1,267,672|40.6%|28|AL AR CT DE FL GA IL IN KY LA MA MD ME MI MS NC NH NJ NY OH PA RI SC TN TX VA VT WV\n",
      "C|  584,872|18.7%| 7|CO IA MO NM OK WI WY\n",
      "∆|      144| 0.0%|\n"
     ]
    }
   ],
   "source": [
    "answers(usa48)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "49 states ⇒ 45,856 cuts (maxsize ≤ 8, area ≤ 638,220) ⇒ 19,860 splits.\n",
      "\n",
      "1. Split that maximizes area(B):\n",
      "A|1,348,714|43.2%|31|AL CT DC DE FL GA IA IL IN KY MA MD ME MI MN MT NC ND NH NJ NY OH PA RI SC SD TN VA VT WI WV\n",
      "B|1,341,666|43.0%|12|AZ CA CO KS LA NM NV OK OR TX UT WA\n",
      "C|  430,048|13.8%| 6|AR ID MO MS NE WY\n",
      "∆|    7,048| 0.2%|\n",
      "\n",
      "2. Split that minimizes ∆ = area(A) - area(B):\n",
      "A|1,267,816|40.6%|13|AZ CA ID KS MN MT ND NE NV OR SD UT WA\n",
      "B|1,267,740|40.6%|29|AL AR CT DC DE FL GA IL IN KY LA MA MD ME MI MS NC NH NJ NY OH PA RI SC TN TX VA VT WV\n",
      "C|  584,872|18.7%| 7|CO IA MO NM OK WI WY\n",
      "∆|       76| 0.0%|\n"
     ]
    }
   ],
   "source": [
    "answers(usa48 | {'DC'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results are not as good (I think because splitting MI allows IL and IN to be north border states rather than interior states)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Four-state cuts\n",
    "\n",
    "If we are restricted to four-state cuts, the proposed {IL, MO, OK, NM} cut is indeed best:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "49 states ⇒ 61 cuts (maxsize ≤ 4, area ≤ 638,220) ⇒ 45 splits.\n",
      "\n",
      "1. Split that maximizes area(B):\n",
      "A|1,607,869|51.5%|18|AZ CA CO IA ID KS MN MT ND NE NV OR SD UP UT WA WI WY\n",
      "B|1,193,381|38.2%|27|AL AR CT DE FL GA IN KY LA LP MA MD ME MS NC NH NJ NY OH PA RI SC TN TX VA VT WV\n",
      "C|  319,110|10.2%| 4|IL MO NM OK\n",
      "∆|  414,488|13.3%|\n",
      "\n",
      "2. Split that minimizes ∆ = area(A) - area(B):\n",
      "A|1,607,869|51.5%|18|AZ CA CO IA ID KS MN MT ND NE NV OR SD UP UT WA WI WY\n",
      "B|1,193,381|38.2%|27|AL AR CT DE FL GA IN KY LA LP MA MD ME MS NC NH NJ NY OH PA RI SC TN TX VA VT WV\n",
      "C|  319,110|10.2%| 4|IL MO NM OK\n",
      "∆|  414,488|13.3%|\n"
     ]
    }
   ],
   "source": [
    "answers(usa49, 4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Achieving equality on question 2\n",
    "\n",
    "Can we find regions with *exactly* equal areas? The function `make_equals` generates contiguous regions (up to maxsize 10), keeping track of the areas, and when it finds a second disjoint region with the same area, it yields the two regions with their area:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_equals(country, maxsize=10) -> Iterator[Tuple[int, Region, Region]]:\n",
    "    \"\"\"Yield (area, A, B) for disjoint regions A, B up to `maxsize` with exactly equal area.\"\"\"\n",
    "    table = {} # {area: region_with_that_area}\n",
    "    for region in contiguous_regions(country, maxsize, area(country) / 2, country):\n",
    "        a = area(region)\n",
    "        if a in table and separated(region, table[a]):\n",
    "            yield (a, region, table[a])\n",
    "        table[a] = region\n",
    "    \n",
    "def separated(A, B) -> bool: \n",
    "    \"\"\"Are regions A and B disjoint with no shared border?\"\"\"\n",
    "    return A.isdisjoint(B) and all(neighbors[a].isdisjoint(B) for a in A)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "543"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "equals = list(make_equals(usa49, 10))\n",
    "len(equals)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From `equals` we can find the two equi-area regions with largest area:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Split with ∆ = 0:\n",
      "A|  874,595|28.0%|10|AL FL GA KS LA MS NC NM OK TX\n",
      "B|  874,595|28.0%|10|CA IA ID IL IN MT ND NV SD WA\n",
      "C|1,371,170|43.9%|29|AR AZ CO CT DE KY LP MA MD ME MN MO NE NH NJ NY OH OR PA RI SC TN UP UT VA VT WI WV WY\n",
      "∆|        0| 0.0%|\n"
     ]
    }
   ],
   "source": [
    "(a, A, B) = max(equals)\n",
    "show('Split with ∆ = 0', usa49, (A, B, usa49 - A - B))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are probably larger regions with equal area, but it would take longer to search for them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Achieving optimality\n",
    "\n",
    "A difference in area of ∆ = 0 is obviously an optimal answer to question 2.\n",
    "\n",
    "How about question 1? We arbitrarily limited cuts to 8 states, going from the north to south border. To prove that we have the best cut, we'll have to eliminate those constraints, allowing cuts of any number of states going between any border states. This will increase run time, probably by an order of magnitude. Fortunately, we can tighten the area constraint tobring that down a bit. We found that the cut {CO, IL, MO, NE, NM} produces a region *B* with area 1,344,149, so that means that any cut that is better for question 1 must create a split where the areas of both *A* and *B* are greater than 1,344,149, Therefore, we can lower `maxarea` from 638,220 to:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "432062"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "area(usa49) - 2 * 1344149"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "49 states ⇒ 547,779 cuts (maxsize ≤ 49, area ≤ 432,062) ⇒ 42,685 splits.\n",
      "\n",
      "1. Split that maximizes area(B):\n",
      "A|1,345,558|43.1%|29|AL AR CT DE FL GA IN KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TN TX VA VT WV\n",
      "B|1,344,149|43.1%|15|AZ CA IA ID MN MT ND NV OR SD UP UT WA WI WY\n",
      "C|  430,653|13.8%| 5|CO IL MO NE NM\n",
      "∆|    1,409| 0.0%|\n",
      "\n",
      "2. Split that minimizes ∆ = area(A) - area(B):\n",
      "A|1,345,558|43.1%|29|AL AR CT DE FL GA IN KS KY LA LP MA MD ME MS NC NH NJ NY OH OK PA RI SC TN TX VA VT WV\n",
      "B|1,344,149|43.1%|15|AZ CA IA ID MN MT ND NV OR SD UP UT WA WI WY\n",
      "C|  430,653|13.8%| 5|CO IL MO NE NM\n",
      "∆|    1,409| 0.0%|\n"
     ]
    }
   ],
   "source": [
    "answers(usa49, maxsize=49, maxarea=432062, start=border, end=border)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This confirms that {CO, IL, MO, NE, NM} is indeed the optimal cut for question 1.\n",
    "\n",
    "(*Note:* the `answers` computation above took about 40 seconds, and the  `make_equals` computation before that took about 50 seconds. All the other cells in this notebook take under 5 seconds. You can use the IPython `%%time` magic directive if you want to see for yourself, or if you want to try to modify the functions to achieve better efficiency.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Unit tests\n",
    "\n",
    "Here are some unit tests; they also serve as examples of input/output of the various functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'ok'"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def test():\n",
    "    assert states('AZ CA OR') == frozenset({'AZ', 'CA', 'OR'})\n",
    "\n",
    "    assert len(usa48) == 48 and len(usa49) == 49 and len(usa50) == 50\n",
    "    assert len(western) == 11 and len(four) == 4\n",
    "    \n",
    "    assert set(areas) == set(neighbors)\n",
    "    assert areas['MI'] == areas['UP'] + areas['LP'] \n",
    "    assert area(states('AZ CA OR')) == area(states('IA IL KY NE SD VA WV')) == 376_064\n",
    "\n",
    "    assert all((x in neighbors[y]) == (y in neighbors[x]) \n",
    "               for x in neighbors for y in neighbors)\n",
    "    \n",
    "    assert set(make_cuts(western, 3)) == {\n",
    "        states('AZ ID NV'),\n",
    "        states('CA ID OR'),\n",
    "        states('CA OR WA'),\n",
    "        states('CA ID NV'),\n",
    "        states('ID NM UT'),\n",
    "        states('AZ ID UT')}\n",
    "\n",
    "    assert set(make_splits(western, make_cuts(western, 3))) == {\n",
    "        (states('CO MT NM WY'), states('CA NV OR WA'), states('AZ ID UT')),\n",
    "        (states('CO MT NM UT WY'), states('CA OR WA'), states('AZ ID NV')),\n",
    "        (states('AZ CO MT NM UT WY'), states('OR WA'), states('CA ID NV')),\n",
    "        (states('AZ CO MT NM NV UT WY'), states('WA'), states('CA ID OR'))}\n",
    "\n",
    "    assert set(contiguous_regions(four, 4, maxarea, four)) == {\n",
    "        states('UT'), states('CO'), states('AZ'), states('NM'),\n",
    "        states('AZ CO'), states('AZ NM'), states('CO NM'), states('NM UT'), states('AZ UT'), states('CO UT'),\n",
    "        states('AZ CO UT'), states('AZ CO NM'), states('CO NM UT'), states('AZ NM UT'),\n",
    "        states('AZ CO NM UT')}\n",
    "\n",
    "    assert floodfill(western - states('AZ ID NV')) == states('CA OR WA')\n",
    "    \n",
    "    for country in (usa48, usa49, border, western, four):\n",
    "        assert floodfill(country) == country\n",
    "        \n",
    "    assert set(make_equals(usa49, 5)) == {\n",
    "        (207853, states('LP MD OH PA WV'), states('IA MO UP WI')),\n",
    "        (258498, states('AL KY MO NC TN'), states('ID SD WY'))}\n",
    "        \n",
    "    assert not separated(south, states('GA SC'))\n",
    "    assert separated(north, south)\n",
    "    \n",
    "    return 'ok'\n",
    "               \n",
    "test()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
